Genetic algorithm for text clustering based on latent semantic indexing
Similar resources
On Dimensionality of Latent Semantic Indexing for Text Segmentation
In this paper we propose features desirable of linear text segmentation algorithms for the Information Retrieval domain, with emphasis on improving high similarity search of heterogeneous texts. We proceed to describe a robust purely statistical method, based on context overlap exploitation, that exhibits these desired features. Ways to automatically determine its internal parameter of latent s...
Double Clustering in Latent Semantic Indexing
Document clustering is a widely researched area of information retrieval. The large amount of documents which must be handled needs automatic organizing. A popular approach to clustering documents and messages is the vector space model, which represents texts with feature vectors, usually generated from the set of terms contained in the message. The clustering based on the document-term frequen...
A Latent Semantic Indexing-based approach to multilingual document clustering
The creation and deployment of knowledge repositories for managing, sharing, and reusing tacit knowledge within an organization has emerged as a prevalent approach in current knowledge management practices. A knowledge repository typically contains vast amounts of formal knowledge elements, which generally are available as documents. To facilitate users' navigation of documents within a knowledge...
Latent Semantic Indexing Based on Factor Analysis
The main purpose of this paper is to propose a novel latent semantic indexing (LSI), statistical approach to simultaneously mapping documents and terms into a latent semantic space. This approach can index documents more effectively than the vector space model (VSM). Latent semantic indexing (LSI), which is based on singular value decomposition (SVD), and probabilistic latent semantic indexing ...
Spam Filtering Based on Latent Semantic Indexing
In this paper, a study on the classification performance of a vector space model (VSM) and of latent semantic indexing (LSI) applied to the task of spam filtering is summarized. Based on a feature set used in the extremely widespread, de-facto standard spam filtering system SpamAssassin, a vector space model and latent semantic indexing are applied for classifying e-mail messages as spam or not...
Journal
Journal title: Computers & Mathematics with Applications
Year: 2009
ISSN: 0898-1221
DOI: 10.1016/j.camwa.2008.10.010